Babel-访问者模式

Visitor（访问者）

当 Babel 处理一个节点时，是以访问者的形式获取节点信息，并进行相关操作，这种方式是通过一个 visitor 对象来完成的。

在 visitor 对象中定义了对于各种节点的访问函数，这样就可以针对不同的节点做出不同的处理。

我们编写的 Babel 插件其实也是通过定义一个实例化 visitor 对象处理一系列的 AST 节点，来完成我们对代码的修改操作。

例子

举个栗子：我们想要处理代码中用来加载模块的import命令语句

import { Ajax } from '../lib/utils';

visitor: {
  Program: {
    enter(path, state) {
      console.log('start processing this module...');
    },
    exit(path, state) {
      console.log('end processing this module!');
    }
  },
  ImportDeclaration (path, state) {
    enter(path, state) {
      console.log('start processing ImportDeclaration...');
      // do something
    },
    exit(path, state) {
      console.log('end processing ImportDeclaration!');
      // do something
    }
  },
   // 拓展：更高级的, 使用同一个方法访问多种类型的节点
  "ExportNamedDeclaration|Flow"(path) {}
}

当把这个插件用于遍历中时，每当处理到一个 import 语句，即 ImportDeclaration 节点时，都会自动调用ImportDeclaration() 方法，这个方法中定义了处理 import 语句的具体操作。

值得注意的是，AST的遍历采用深度优先遍历。所以当创建访问者时实际上有两次机会来访问一个节点。

─ Program.enter() 
  ─ ImportDeclaration.enter()
  ─ ImportDeclaration.exit()
─ Program.exit()

Path（路径）

从上面的visitor对象中，可以看到每次访问节点方法时，都会传入一个 path 参数。

Path 是表示两个节点之间连接的对象。

在某种意义上，路径是一个节点在树中的位置以及关于该节点各种信息的响应式 Reactive 表示。

当你调用一个修改树的方法后，路径信息也会被更新。

Babel 帮你管理这一切，从而使得节点操作简单，尽可能做到无状态。

这个对象不仅包含了当前节点的信息，也有当前节点的父节点的信息，同时也包含了添加、更新、移动和删除节点有关的其他很多方法。具体地，Path对象包含的属性和方法主要如下：

── 属性      
	- type 节点类型
  - node   当前节点
  - parent  父节点
  - parentPath 父path
  - scope   作用域
  - context  上下文
  - ...
── 方法
  - get   当前节点
  - findParent  向父节点搜寻节点
  - getSibling 获取兄弟节点
  - replaceWith  用AST节点替换该节点
  - replaceWithMultiple 用多个AST节点替换该节点
  - insertBefore  在节点前插入节点
  - insertAfter 在节点后插入节点
  - remove   删除节点
  - ...

具体的可以查看babel-traverse。

例子

继续上面的例子，看看 path 参数的 node 属性包含哪些信息：

visitor: {
  ImportDeclaration (path, state) { 
    console.log(path.node);
    // do something
  }
}

打印结果如下：

Node {
  type: 'ImportDeclaration',
  start: 5,
  end: 41,
  loc: 
   SourceLocation {
     start: Position { line: 2, column: 4 },
     end: Position { line: 2, column: 40 } 
	 },
  specifiers: // 表示 import 导入的变量组成的节点数组
   [ Node {
       type: 'ImportSpecifier',
       start: 14,
       end: 18,
       loc: [SourceLocation],
       imported: [Node], // 表示从导出模块导出的变量
       local: [Node] } ], // 表示导入后当前模块的变量 比如 import { Ajax as ajax } from '../lib/utils'; local 为 ajax
  source:  // 表示导出模块的来源节点
   Node {
     type: 'StringLiteral',
     start: 26,
     end: 40,
     loc: SourceLocation { start: [Position], end: [Position] },
     extra: { rawValue: '../lib/utils', raw: '\'../lib/utils\'' },
     value: '../lib/utils'
    }
}

State（状态）

State 是 visitor 对象中每次访问节点方法时传入的第二个参数。

简单来说，state 就是一系列状态的集合，包含诸如当前 plugin 的信息、plugin 传入的配置参数信息，甚至当前节点的 path 信息也能获取到，当然也可以把 babel 插件处理过程中的自定义状态存储到state对象中。

副作用的处理

实际上访问者的工作比我们想象的要复杂的多，上面示范的是静态 AST 的遍历过程。而 AST 转换本身是有副作用的，比如插件将旧的节点替换了，那么访问者就没有必要再向下访问旧节点了，而是继续访问新的节点, 代码如下。

traverse(ast, {
  ExpressionStatement(path) {
    // 将 `console.log('hello' + v + '!')` 替换为 `return ‘hello’ + v`
    const rtn = t.returnStatement(t.binaryExpression('+', t.stringLiteral('hello'), t.identifier('v')))
    path.replaceWith(rtn)
  },
}

上面的代码, 将console.log('hello' + v + '!')语句替换为return "hello" + v;, 下图是遍历的过程：

我们可以对 AST 进行任意的操作，比如删除父节点的兄弟节点、删除第一个子节点、新增兄弟节点… 当这些操作’污染’了 AST 树后，访问者需要记录这些状态，响应式(Reactive)更新 Path 对象的关联关系, 保证正确的遍历顺序，从而获得正确的转译结果。

Scopes（作用域）

这里的作用域其实跟 js 说的作用域是一个道理，也就是说 babel 在处理 AST 时也需要考虑作用域的问题，比如函数内外的同名变量需要区分开来。

例子

举一个栗子：比如你要将 add 函数的第一个参数 foo 标识符修改为a，就需要递归遍历子树，查出foo标识符的所有引用, 然后替换它:

const a = 1, b = 2
function add(foo, bar) {
  console.log(a, b)
  return foo + bar
}

traverse(ast, {
  // 将第一个参数名转换为a
  FunctionDeclaration(path) {
    const firstParams = path.get('params.0')
    if (firstParams == null) {
      return
    }

    const name = firstParams.node.name
    // 递归遍历，这是插件常用的模式。这样可以避免影响到外部作用域
    path.traverse({
      Identifier(path) {
        if (path.node.name === name) {
          path.replaceWith(t.identifier('a'))
        }
      }
    })
  },
})

console.log(generate(ast).code)
// function add(a, bar) {
//   console.log(a, b);
//   return a + bar;
// }

然后就发现，替换成 a 之后, console.log(a, b) 的行为就被破坏了。所以这里不能用 a，得换个标识符, 譬如c。这里需要借助 Scope 对象来处理。

Scope 对象

在Babel中，使用Scope对象来表示作用域。我们可以通过Path对象的scope字段来获取当前节点的Scope对象。它的结构如下:

{
  path: NodePath;
  block: Node;         // 所属的词法区块节点, 例如函数节点、条件语句节点
  parentBlock: Node;   // 所属的父级词法区块节点
  parent: Scope;       // ⚛️指向父作用域
  bindings: { [name: string]: Binding; }; // ⚛️ 该作用域下面的所有绑定(即该作用域创建的标识符)
}

Scope 对象和 Path 对象差不多，它包含了作用域之间的关联关系(通过 parent 指向父作用域)，收集了作用域下面的所有绑定(bindings), 另外还提供了丰富的方法来对作用域仅限操作。

bindings 属性

我们可以通过 bindings 属性获取当前作用域下的所有绑定(即标识符)，每个绑定由 Binding 类来表示：

export class Binding {
  identifier: t.Identifier;
  scope: Scope;
  path: NodePath;
  kind: "var" | "let" | "const" | "module";
  referenced: boolean;
  references: number;              // 被引用的数量
  referencePaths: NodePath[];      // ⚛️获取所有应用该标识符的节点路径
  constant: boolean;               // 是否是常量
  constantViolations: NodePath[];
}

例子2

有了 Scope 和 Binding, 现在有能力实现安全的变量重命名转换了。为了更好地展示作用域交互，在上面代码的基础上，我们再增加一下难度：

const a = 1, b = 2
function add(foo, bar) {
  console.log(a, b)
  return () => {
    const a = '1' // 新增了一个变量声明
    return a + (foo + bar)
  }
}

现在要重命名函数参数 foo, 不仅要考虑外部的作用域, 也要考虑下级作用域的绑定情况，确保这两者都不冲突。

上面的代码作用域和标识符引用情况如下图所示:

// 用于获取唯一的标识符
const getUid = () => {
  let uid = 0
  return () => `_${(uid++) || ''}`
}

const ast = babel.parseSync(code)
traverse(ast, {
  FunctionDeclaration(path) {
    // 获取第一个参数
    const firstParam = path.get('params.0')
    if (firstParam == null) {
      return
    }

    const currentName = firstParam.node.name
    const currentBinding = path.scope.getBinding(currentName)
    const gid = getUid()
    let sname

    // 循环找出没有被占用的变量名
    while(true) {
      sname = gid()

      // 1️⃣首先看一下父作用域是否已定义了该变量
      if (path.scope.parentHasBinding(sname)) {
        continue
      }

      // 2️⃣ 检查当前作用域是否定义了变量
      if (path.scope.hasOwnBinding(sname)) {
        // 已占用
        continue
      }

      //  再检查第一个参数的当前的引用情况,
      // 如果它所在的作用域定义了同名的变量，我们也得放弃
      if (currentBinding.references > 0) {
        let findIt = false
        for (const refNode of currentBinding.referencePaths) {
          if (refNode.scope !== path.scope && refNode.scope.hasBinding(sname)) {
            findIt = true
            break
          }
        }
        if (findIt) {
          continue
        }
      }
      break
    }

    // 开始替换掉
    const i = t.identifier(sname)
    currentBinding.referencePaths.forEach(p => p.replaceWith(i))
    firstParam.replaceWith(i)
  },
})

console.log(generate(ast).code)
// const a = 1,
//       b = 2;

// function add(_, bar) {
//   console.log(a, b);
//   return () => {
//     const a = '1'; // 新增了一个变量声明

//     return a + (_ + bar);
//   };
// }

上面的例子虽然没有什么实用性，而且还有Bug(没考虑label)，但是正好可以揭示了作用域处理的复杂性。

generateUid 方法

Scope 对象提供了一个generateUid方法来生成唯一的、不冲突的标识符。

traverse(ast, {
  FunctionDeclaration(path) {
    const firstParam = path.get('params.0')
    if (firstParam == null) {
      return
    }
    let i = path.scope.generateUid('_') // 也可以使用generateUid
    path.scope.rename(firstParam.node.name, i)
  },
})

应用

作用域操作最典型的场景是代码压缩，代码压缩会对变量名、函数名等进行压缩… 然而实际上很少的插件场景需要跟作用域进行复杂的交互。

参考文章

深入Babel，这一篇就够了

深入浅出 Babel 上篇：架构和原理 + 实战